chore: add DoclingDocument validation rules #349
✅ DCO Check Passed: Thanks @vagenas, all your commits are properly signed off. 🎉
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates: Wonderful, this rule succeeded. When test data is updated, we require two reviewers.
Codecov Report❌ Patch coverage is
Hi Everyone, I'm using Docling's basic OCR to convert a PDF into text, and I save the output as a JSON file. In one of the PDFs I'm processing, the JSON result includes a group (Group 4), which lists its children as references to texts 61 through 77. Here's a simplified version of what the original JSON looks like:
However, after loading this JSON using:
…the children list in Group 4 unexpectedly changes. Here's what I get after validation:
Notably, groups/44 and groups/45 were inserted automatically, and texts/76 and texts/77 are now missing from the group. Is this due to the list_item labels (texts/69 and texts/71) not being inside a proper list group? I've seen references in the source where validation enforces that a list_item be wrapped in a ListGroup, and I'm wondering whether this restructuring is caused by that. It seems to be a similar error to the one in the above pull request. Is there a recommended way to handle this? Should I wrap any list_item in a separate group before validation to prevent auto-generated groups? Happy to provide a more complete sample if needed. Thanks in advance!
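To narrow down whether stray list_item entries are involved, one option is to inspect the raw JSON before it is validated. The sketch below works on a plain dict and does not call docling at all. It assumes each item carries `self_ref`, `label`, and a `parent` of the form `{"$ref": "#/groups/4"}`, and that list containers use the group labels `list` or `ordered_list`. Those field names and labels are assumptions about the serialized format, and `orphan_list_items` is a hypothetical helper, not part of the library.

```python
from typing import Any

# Assumed labels for list containers; adjust to match your docling-core version.
LIST_GROUP_LABELS = {"list", "ordered_list"}


def resolve(doc: dict[str, Any], ref: str) -> dict[str, Any]:
    """Resolve a ref such as '#/groups/4' against the raw document dict."""
    _, collection, index = ref.split("/")
    return doc[collection][int(index)]


def orphan_list_items(doc: dict[str, Any]) -> list[str]:
    """Return the self_ref of every list_item whose parent is not a list group."""
    orphans = []
    for item in doc.get("texts", []):
        if item.get("label") != "list_item":
            continue
        parent_ref = (item.get("parent") or {}).get("$ref", "")
        # Parents outside '#/groups/...' (e.g. the body) are never list groups.
        parent = resolve(doc, parent_ref) if parent_ref.startswith("#/groups/") else {}
        if parent.get("label") not in LIST_GROUP_LABELS:
            orphans.append(item["self_ref"])
    return orphans
```

Running this on the result of `json.load` for the saved file would list the items that sit outside a list group. Whether those are the ones that trigger the regrouping is still a question for the maintainers.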
lgtm!
Signed-off-by: Panos Vagenas <pva@zurich.ibm.com>
e5cfec4
b82af7e to e5cfec4
lgtm!
No description provided.